Who Are My Ancestors? Retrieving Family Relationships from Historical Texts

Authors

  • Julia Efremova
  • Alejandro Montes García
  • Alfredo Bolt Iriondo
  • Toon Calders
Abstract

This paper presents an approach for automatically retrieving family relationships from a real-world collection of Dutch historical notary acts. We aim to retrieve relationships such as husband-wife, parent-child, and widow-of. Our approach comprises person name extraction, reference disambiguation, candidate generation, and family relationship prediction. Because only a limited amount of training data is available, we evaluate different feature configurations based on n-gram analysis. The best results were obtained with a combination of word bi-grams and trigrams together with the distance in words between the two names. We evaluate our results for each relationship type in terms of precision, recall, and F-score.
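The candidate-generation, feature, and evaluation steps summarized above can be illustrated with a minimal Python sketch. It assumes that each name mention is a (start, end) token span within one act, that candidates are all pairs of mentions in the same act, and that the bi-grams and trigrams are drawn from the words between the two names. The abstract does not specify these details or the classifier used, so the function names (`candidate_pairs`, `pair_features`, `per_type_scores`), the `"none"` label, and the example sentence are hypothetical. The last function computes precision, recall, and F-score separately for each relationship type.

```python
from collections import Counter
from itertools import combinations

# Assumption: a name mention is a (start, end) token span, end exclusive.
Span = tuple[int, int]


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """Return contiguous word n-grams, each joined with single spaces."""
    return [" ".join(tokens[k:k + n]) for k in range(len(tokens) - n + 1)]


def candidate_pairs(mentions: list[Span]) -> list[tuple[Span, Span]]:
    """Hypothetical candidate generation: every pair of mentions in one act, in text order."""
    return list(combinations(sorted(mentions), 2))


def pair_features(tokens: list[str], a: Span, b: Span) -> Counter:
    """Count word bi-grams and trigrams between mentions a and b (a precedes b),
    and record the distance in words between them."""
    between = [t.lower() for t in tokens[a[1]:b[0]]]
    feats: Counter = Counter()
    for n in (2, 3):
        for gram in word_ngrams(between, n):
            feats[f"{n}gram={gram}"] += 1
    feats["distance"] = len(between)
    return feats


def per_type_scores(gold: list[str], pred: list[str],
                    types: list[str]) -> dict[str, tuple[float, float, float]]:
    """Precision, recall and F-score per relationship type.
    Each position is one candidate pair; the hypothetical label 'none' means no relation."""
    scores = {}
    for t in types:
        tp = sum(g == t and p == t for g, p in zip(gold, pred))
        fp = sum(g != t and p == t for g, p in zip(gold, pred))
        fn = sum(g == t and p != t for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[t] = (precision, recall, f_score)
    return scores


if __name__ == "__main__":
    # Made-up example in the style of a notarial act: "Jan Jansen and his wife Maria Pieters".
    tokens = "Jan Jansen en zijn huisvrouw Maria Pieters".split()
    for a, b in candidate_pairs([(0, 2), (5, 7)]):
        print(a, b, dict(pair_features(tokens, a, b)))
```

In practice the feature dictionaries would be vectorized and passed to whatever classifier is chosen; that step is omitted here because the abstract does not name one.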

Similar resources

Comparison of Bibliographic Databases in Retrieving Information on Telemedicine

Background & Aims: Some of the main questions which can be of importance for those researchers who intend to perform a systematic review in a field of science are: ‘What databases should I use for my review?’; ‘Do all these databases have the same value?’; and ‘Which sources retrieved the highest number of relevant references?’. The main aim of this work was the identification of the best database for ...

"Lungisa"-weaving relationships and social space to restore health in rural KwaZulu Natal.

Many Zulu people who live in big cities in South Africa return to their rural homestead when they fall ill. Although the health care offered in rural areas is not efficient, people wish to connect to their family and ancestors. My aim is to explore acts of lungisa ("to put in order") and what they say about health, agency, and the circumstances under which people live. Returning home means weav...

Harvesting Indices to Grow a Controlled Vocabulary: Towards Improved Access to Historical Legal Texts

We describe ongoing work aiming at deriving a multilingual controlled vocabulary (German, French, Italian) from the combined subject indices from 22 volumes of a large-scale critical edition of historical documents. The controlled vocabulary is intended to support editors in assigning descriptors to new documents and to support users in retrieving documents of interest regardless of the spellin...

A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora

We present a software system for retrieving and exploring duplicated text passages in low-quality OCR historical text corpora. The system combines NCBI BLAST, a software created for comparing and aligning biological sequences, with the Solr search and indexing engine, providing a web interface to easily query and browse the clusters of duplicated texts. We demonstrate the system on a corpus of scanned...

“Amir-e- dad”: A Governmental Position in Some of Persian Literary and Historical Texts

Governmental systems and administrative organizations in Iran have always been changing throughout history. A position which was necessary in some centuries and was counted as an important governmental position was “Amirdadi”. The person who held this post was called Amir-e dad or Mirdad. Lexicographers have mentioned several different tasks for Amir-e dad. Some believe they are th...

Publication date: 2015